library(dplyr)
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(readr)
library(ggplot2)
histologies_df <- read_tsv(file.path("..", "..", "data",
"pbta-histologies.tsv"))
Parsed with column specification:
cols(
.default = col_character(),
OS_days = [32mcol_double()[39m,
age_last_update_days = [32mcol_double()[39m,
normal_fraction = [32mcol_double()[39m,
tumor_fraction = [32mcol_double()[39m,
tumor_ploidy = [32mcol_double()[39m,
molecular_subtype = [33mcol_logical()[39m
)
See spec(...) for full column specifications.
221 parsing failures.
row col expected actual file
2606 molecular_subtype 1/0/T/F/TRUE/FALSE Group3 '../../data/pbta-histologies.tsv'
2607 molecular_subtype 1/0/T/F/TRUE/FALSE Group4 '../../data/pbta-histologies.tsv'
2608 molecular_subtype 1/0/T/F/TRUE/FALSE Group3 '../../data/pbta-histologies.tsv'
2609 molecular_subtype 1/0/T/F/TRUE/FALSE Group3 '../../data/pbta-histologies.tsv'
2610 molecular_subtype 1/0/T/F/TRUE/FALSE Group3 '../../data/pbta-histologies.tsv'
.... ................. .................. ...... .................................
See problems(...) for more details.
chordoma_samples <- histologies_df %>%
filter(short_histology == "Chordoma") %>%
pull(Kids_First_Biospecimen_ID)
# TODO: update to use consensus file and likely a more permissive version of
# the focal-cn-file-preparation annotation step
# Here, we're using an older version of the annotated files that used exons
focal_cn_df <- read_tsv("https://github.com/AlexsLemonade/OpenPBTA-analysis/raw/fa214291713575be7fd20c92374b268870f4173f/analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz")
Parsed with column specification:
cols(
biospecimen_id = [31mcol_character()[39m,
status = [31mcol_character()[39m,
copy_number = [32mcol_double()[39m,
ploidy = [32mcol_double()[39m,
ensembl = [31mcol_character()[39m,
gene_symbol = [31mcol_character()[39m,
cytoband = [31mcol_character()[39m
)
chordoma_loss <- focal_cn_df %>%
filter(biospecimen_id %in% chordoma_samples, gene_symbol == "SMARCB1")
chordoma_loss
#we need to include the sample_id field from pbta-histologies.tsv in the final table (field will allow #us to map between RNA-seq (e.g., SMARCB1 expression values) and WGS data (e.g., SMARCB1 focal copy #number status) from the same event for a given individual).
#To get the SMARCB1 jitter plot in the photo here #250 (comment), you will first need to read in the #collapsed expression data
expression_data <- read_rds(file.path("..", "..", "data", "pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"))
expression_data